Search CORE

36 research outputs found

A novel dimensionality reduction technique based on independent component analysis for modeling microarray gene expression data

Author: Kustra Rafal
Liu Han
Zhang Ji
Publication venue: CSREA Press
Publication date: 01/01/2004

DNA microarray experiments, which generate thousands of gene expression measurements, are being used to gather information from tissue and cell samples about gene expression differences that will be useful in diagnosing disease. One challenge of microarray studies, however, is that the number n of samples collected is small relative to the number p of genes per sample, which usually runs into the thousands. In statistical terms, this very large number of predictors relative to the number of samples (observations) makes the classification problem difficult; this is known as the "curse of dimensionality". An efficient way to address this problem is to use dimensionality reduction techniques. Principal Component Analysis (PCA) is a leading method for dimensionality reduction of gene expression data and is optimal in the least-squares sense. In this paper we propose a new dimensionality reduction technique for specific bioinformatics applications based on Independent Component Analysis (ICA). Because it exploits higher-order statistics to identify a linear model, this ICA-based dimensionality reduction technique outperforms PCA in both statistical and biological significance. We present experiments on the NCI 60 dataset to demonstrate this result.

University of Southern Queensland ePrints
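The ICA abstract above uses PCA as its baseline. As a minimal sketch of that baseline only (not the authors' ICA method, which would further rotate the reduced components to maximize non-Gaussianity), the code below finds the leading principal component of a small samples-by-genes matrix by power iteration. Because n is much smaller than p in microarray data, it iterates on the n x n Gram matrix rather than the p x p covariance. The function names and toy data are hypothetical; a real analysis would use a numerical library's SVD.

```python
import math
import random


def center_columns(X):
    """Subtract each column (gene) mean so every gene has mean zero."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def leading_pc(X, iters=500, seed=0):
    """Return (unit gene loading vector, score per sample) for the first PC.

    Power iteration runs on the n x n Gram matrix Xc Xc^T, which is cheaper
    than the p x p covariance when samples (n) are far fewer than genes (p).
    """
    Xc = center_columns(X)
    n, p = len(Xc), len(Xc[0])
    # Gram matrix: inner products between centered samples.
    K = [[sum(Xc[i][k] * Xc[j][k] for k in range(p)) for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    u = [rng.random() for _ in range(n)]
    for _ in range(iters):
        w = [sum(K[i][j] * u[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        u = [x / norm for x in w]
    # Map the sample-space eigenvector back to a unit-length gene loading.
    v = [sum(Xc[i][k] * u[i] for i in range(n)) for k in range(p)]
    vnorm = math.sqrt(sum(x * x for x in v))
    v = [x / vnorm for x in v]
    # Project each centered sample onto the loading (its reduced coordinate).
    scores = [sum(Xc[i][k] * v[k] for k in range(p)) for i in range(n)]
    return v, scores
```

Each sample is thus reduced from p gene values to a single score; keeping several components would repeat the procedure after deflating K.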

A factor analysis model for functional genomics

Author: Kustra Rafal
Shioda Romy
Zhu Mu
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Expression array data are used to predict biological functions of uncharacterized genes by comparing their expression profiles to those of characterized genes. While this approach is biologically plausible, it is both statistically and computationally challenging. Typical approaches are computationally expensive and ignore correlations among expression profiles and functional categories. RESULTS: We propose a factor analysis model (FAM) for functional genomics and give a two-step algorithm, using genome-wide expression data for yeast and a subset of Gene Ontology Biological Process functional annotations. We show that the predictive performance of our method is comparable to that of the current best approach, while our total computation time was shorter by a factor of 4000. We discuss the unique challenges in evaluating the performance of algorithms used for genome-wide functional genomics. Finally, we discuss extensions to our method that can incorporate the inherent correlation structure of the functional categories to further improve predictive performance. CONCLUSION: Our factor analysis model is a computationally efficient technique for functional genomics and provides a clear and unified statistical framework with potential for incorporating important Gene Ontology information to improve predictions.

University of Toronto Research Repository

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central


CFH and ARMS2 genetic risk determines progression to neovascular age-related macular degeneration after antioxidant and zinc supplementation

Author: Awh Carl C.
Kustra Rafal
Small Kent W.
Tibshirani Robert J.
Vavvas Demetrios G.
Zanke Brent W.
Publication venue: Proceedings of the National Academy of Sciences
Publication date: 20/03/2018

We evaluated the influence of an antioxidant and zinc nutritional supplement [the Age-Related Eye Disease Study (AREDS) formulation] on delaying or preventing progression to neovascular AMD (NV) in persons with age-related macular degeneration (AMD). AREDS subjects (n = 802) with category 3 or 4 AMD at baseline who had been treated with placebo or the AREDS formulation were evaluated for differences in the risk of progression to NV as a function of complement factor H (CFH) and age-related maculopathy susceptibility 2 (ARMS2) genotype groups. We used published genetic grouping: a two-SNP haplotype risk-calling algorithm to assess CFH, and either the single SNP rs10490924 or 372_815del443ins54 to mark ARMS2 risk. Progression risk was determined using the Cox proportional hazards model. Genetics–treatment interaction on NV risk was assessed using a multiiterative bootstrap validation analysis. We identified a strong interaction of genetics with AREDS formulation treatment on the development of NV. Individuals with high CFH risk and no ARMS2 risk alleles who took the AREDS formulation had increased progression to NV compared with placebo. Those with low CFH risk and high ARMS2 risk had decreased progression risk. Analysis of CFH and ARMS2 genotype groups from a validation dataset reinforces this conclusion. Bootstrapping analysis confirms the presence of a genetics–treatment interaction and suggests that individual treatment response to the AREDS formulation is largely determined by genetics. The AREDS formulation modifies the risk of progression to NV based on individual genetics. Its use should be based on patient-specific genotype.

Harvard University - DASH
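The AREDS abstract above determines progression risk with a Cox proportional hazards model. As an illustrative sketch only, assuming a single binary covariate (for example 1 = AREDS formulation, 0 = placebo) and no tied event times, the function below estimates the log hazard ratio by Newton-Raphson on the Cox partial likelihood. The name `cox_fit_single` and the toy data are hypothetical; the published analysis used genotype groups, genetics–treatment interaction terms, and bootstrap validation, none of which is reproduced here.

```python
import math


def cox_fit_single(times, events, x, iters=25):
    """Newton-Raphson fit of a one-covariate Cox model (assumes no tied times).

    times: follow-up time per subject; events: 1 if progression was observed,
    0 if censored; x: covariate value per subject.
    Returns the estimated log hazard ratio beta.
    """
    beta = 0.0
    n = len(times)
    for _ in range(iters):
        score, info = 0.0, 0.0
        for i in range(n):
            if not events[i]:
                continue  # censored subjects only contribute via risk sets
            # Risk set: everyone still under observation at subject i's event time.
            risk = [j for j in range(n) if times[j] >= times[i]]
            w = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(w)
            s1 = sum(wj * x[j] for wj, j in zip(w, risk))
            s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
            # Gradient and observed information of the log partial likelihood.
            score += x[i] - s1 / s0
            info += s2 / s0 - (s1 / s0) ** 2
        beta += score / info
    return beta
```

A positive estimate means the covariate is associated with faster progression; `math.exp(beta)` gives the hazard ratio.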

The utility and predictive value of combinations of low penetrance genes for screening and risk prediction of colorectal cancer

Author: A Ahlbom
A Tenesa
ACJW Janssens
AD Müller
ADL Chapelle
American College of Medicine Genetics. Board of Directors
B Williams-Jones
BE Sirovich
Brent W. Zanke
BS Ling
BW Zanke
Celia M. T. Greenwood
D Goldstein
DC Thomas
DF Ransohoff
DK Rex
DM Gertig
E Jaeger
F Ramji
FE Harrell
I Tomlinson
JA Hanley
JB Simon
JE Allison
John McLaughlin
Julian Little
JV Selby
K Chen
L Aaltonen
L Madlensky
L Rabeneck
L Sharp
MJ Khoury
ML Slattery
MM Jong de
MT Mandelson
Multicentre Australian Colorectal-neoplasia Screening (MACS) Group
O Kronborg
P Autier
P Broderick
P Lichtenstein
P Moayyedi
P Vineis
PA Bampton
PA Newcomb
PD Pharoah
PDP Pharoah
Q Yang
Quanhe Yang
Rafal Kustra
RE Schabas
RS Houlston
S Browning
S Gray
S Küry
SE Gollust
Steven J. Hawken
T Caulfield
Thomas J. Hudson
UK Flexible Sigmoidoscopy Screening Trial Investigators
UK Trial of Early Detection of Breast Cancer Group
W Atkin
Z Kemp
Publication venue: Springer-Verlag
Publication date: 01/01/2010

Although colorectal cancer (CRC) is highly treatable if detected early, a very low proportion of the eligible population undergoes screening for it. Integrating a genomic screening profile into existing CRC screening programs could improve the effectiveness of population screening, both by allowing individuals to be assigned to different types and intensities of screening and by potentially increasing uptake of existing screening programs. We evaluated the utility and predictive value of genomic profiling as applied to CRC, and as a potential component of a population-based cancer screening program. We generated simulated data representing a typical North American population, including a variety of genetic profiles with a range of relative risks and prevalences for individual risk genes. We then used these data to estimate parameters characterizing the predictive value of a logistic regression model built on genetic markers for CRC. Meta-analyses of genetic associations with CRC were used to inform the simulation work and to select genetic variants for logistic regression model-building using data from the ARCTIC study in Ontario, which included 1,200 CRC cases and a similar number of cancer-free population-based controls. Our simulations demonstrate that, under reasonable assumptions involving modest relative risks for individual genetic variants, substantial predictive power can be achieved when risk variants are common (e.g., prevalence > 20%) and data for enough risk variants are available (e.g., ~140–160). Pilot work in population data shows modest but statistically significant predictive utility for a small collection of risk variants, smaller in effect than age and gender alone in predicting an individual's CRC risk. Further genotyping and many more samples will be required, as will the discovery of many more CRC-associated risk loci, before the question of the potential utility of germline genomic profiling can be definitively answered.

Crossref

Springer - Publisher Connector

PubMed Central
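The CRC abstract above reports that many common, modest-effect variants can together give substantial predictive power. The sketch below illustrates that idea under loudly simplified assumptions that are not the authors' simulation design: independent variants, carrier status only (one indicator per variant), a per-variant odds ratio of 1.2, a carrier frequency of 0.3, and a risk score equal to the true log-odds (no model fitting). Predictive power is summarized as the area under the ROC curve (AUC), computed with the Mann-Whitney rank-sum formula. All names and parameter values are hypothetical.

```python
import math
import random


def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney rank-sum statistic, ties averaged."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank for this block of ties
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def simulate_auc(n_variants, carrier_freq=0.3, odds_ratio=1.2,
                 n_people=5000, baseline_logit=-2.5, seed=1):
    """Simulate carriers of independent risk variants; score each person by true log-odds."""
    rng = random.Random(seed)
    beta = math.log(odds_ratio)
    scores, labels = [], []
    for _ in range(n_people):
        carriers = sum(rng.random() < carrier_freq for _ in range(n_variants))
        score = beta * carriers
        # Center the score so overall disease frequency stays roughly fixed.
        logit = baseline_logit + score - beta * carrier_freq * n_variants
        p = 1 / (1 + math.exp(-logit))
        scores.append(score)
        labels.append(1 if rng.random() < p else 0)
    return auc(scores, labels)
```

Comparing `simulate_auc(150)` with `simulate_auc(5)` shows discrimination rising as more common variants contribute, even though each variant alone is weak.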

Soft decision trees

Author: Kustra Rafal
Publication date: 01/01/1997

grantor: University of Toronto
Soft Decision Trees (SDTs) are a new class of semi-parametric methods for classification and regression. They attempt to retain the features that made tree-like techniques widely popular (interpretability, graphical summary of the result, automatic variable selection and interaction detection, etc.) while improving their predictive performance and making the model more believable. This is done by employing "soft", or stochastic, splits, which result in blurred partition boundaries and a continuous prediction surface. The parameters are fitted by maximum likelihood using the EM algorithm. Simulation experiments indicate that SDTs are indeed more powerful predictors, and real data analysis shows that SDTs can also aid interpretation.
M.Sc.

University of Toronto Research Repository
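The soft-split idea in the SDT abstract above can be illustrated with prediction alone. In the sketch below, each internal node sends an observation right with a probability given by a logistic gate, so the output is a probability-weighted blend of leaf values and varies continuously instead of jumping at the threshold. The logistic gate, the `scale` parameter, and the nested-dict tree format are assumptions chosen for illustration (the abstract does not specify the gate form), and the EM-based maximum-likelihood fitting is not shown.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_tree_predict(node, x):
    """Predict from a soft tree given as nested dicts (hypothetical format).

    Internal node: {"feature": j, "threshold": c, "scale": s, "left": ..., "right": ...}
    Leaf: {"value": v}.
    x goes right with probability sigmoid((x[j] - c) / s); as s -> 0 the gate
    approaches an ordinary hard split at c.
    """
    if "value" in node:
        return node["value"]
    p_right = sigmoid((x[node["feature"]] - node["threshold"]) / node["scale"])
    return ((1.0 - p_right) * soft_tree_predict(node["left"], x)
            + p_right * soft_tree_predict(node["right"], x))
```

For example, a single split at 0 with leaves 0 and 10 predicts 5.0 exactly at the threshold and approaches 0 or 10 far from it, giving the blurred boundary the abstract describes.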

Statistical analysis of medical images with applications to neuroimaging

Author: Kustra Rafal
Publication date: 01/01/2000

grantor: University of Toronto
We extend a classical multivariate technique, Linear Discriminant Analysis (LDA), and apply it to the analysis of PET and fMRI images of human brain function to discover regions of activation driven by the experimental stimuli. We re-examine and specialize some equivalences between LDA and both Canonical Correlation Analysis (CCA) and Multivariate ANOVA (MANOVA). Furthermore, efficient algorithms are derived to facilitate applying these multivariate models to extremely large image data. We deal with the ill-posed nature of the problem using spatial basis expansion and penalization (via the Penalized Discriminant Analysis (PDA) of Hastie et al. (1995)), and use efficient measures of predictive performance to optimize hyperparameters and validate the models robustly. We examine expanding the images in 3D tensor-product B-spline and wavelet bases and compare the results with those obtained without expansion. Some parallels between our proposal and approaches currently popular in the neuroimaging community are discussed. Another extension to PDA is derived and applied that allows one to model time-series effects present in fMRI images. We conclude with many possible enhancements to the proposed paradigm.
Ph.D.

University of Toronto Research Repository
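The thesis abstract above handles the ill-posed LDA problem with penalization via PDA. PDA replaces the within-class covariance with a penalized version; the sketch below uses the simplest special case, a ridge penalty λI, for two classes, giving the discriminant direction w = (S_W + λI)^{-1}(μ1 − μ0). The ridge choice, the function names, and the toy data are assumptions for illustration; the thesis's spatial basis expansions and general penalty matrices are not reproduced.

```python
def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting (small square A)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w


def penalized_lda_direction(X0, X1, lam):
    """Two-class discriminant direction w = (S_W + lam * I)^{-1} (mean1 - mean0)."""
    p = len(X0[0])
    m0 = [sum(x[j] for x in X0) / len(X0) for j in range(p)]
    m1 = [sum(x[j] for x in X1) / len(X1) for j in range(p)]
    # Pooled within-class covariance S_W.
    S = [[0.0] * p for _ in range(p)]
    for X, m in ((X0, m0), (X1, m1)):
        for x in X:
            for a in range(p):
                for b in range(p):
                    S[a][b] += (x[a] - m[a]) * (x[b] - m[b])
    dof = len(X0) + len(X1) - 2
    for a in range(p):
        for b in range(p):
            S[a][b] /= dof
        S[a][a] += lam  # ridge penalty keeps the system well-posed when p is large
    return solve(S, [m1[j] - m0[j] for j in range(p)])
```

With many more voxels than scans, S_W alone is singular; the added λI makes it invertible, and a larger λ pulls w toward the plain mean difference.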

Reduced-Rank Multivariate Model for Time-Course Microarray Data

Author: Rafal Kustra

Abstract: In this paper we present a novel, multi-gene approach to time-course microarray experiments. One advantage of our approach is that it explicitly models the correlation structure among gene expression data. The proposed approach is computationally attractive. We apply the model to the well-known yeast cell-cycle microarray data and present results that compare favorably with those of previous studies.

CiteSeerX
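The reduced-rank idea in the abstract above can be illustrated generically: a genes-by-time-points matrix is approximated by a few shared temporal profiles, each gene having loadings on them. The sketch below computes only the best rank-1 approximation Y ≈ g tᵀ by alternating least squares (equivalent to power iteration for the top singular pair). This is a generic low-rank technique chosen for illustration, not the author's model; the names and toy data are hypothetical.

```python
import random


def rank1_approx(Y, iters=100, seed=0):
    """Best rank-1 approximation Y ~ outer(g, t) by alternating least squares.

    Rows of Y are genes and columns are time points; t is a shared temporal
    profile and g holds each gene's loading on it.
    """
    n_genes, n_times = len(Y), len(Y[0])
    rng = random.Random(seed)
    t = [rng.gauss(0.0, 1.0) for _ in range(n_times)]  # random start profile
    for _ in range(iters):
        # Least-squares gene loadings given the current profile.
        tt = sum(v * v for v in t)
        g = [sum(Y[i][k] * t[k] for k in range(n_times)) / tt for i in range(n_genes)]
        # Least-squares profile given the current loadings.
        gg = sum(v * v for v in g)
        t = [sum(Y[i][k] * g[i] for i in range(n_genes)) / gg for k in range(n_times)]
    return g, t
```

Fitting additional ranks would repeat the procedure on the residual Y − g tᵀ.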